Abstract

Background: Long non-coding RNAs (lncRNAs) are emerging as important regulators of cell physiology, but it is
yet unknown to what extent lncRNAs have evolved to be targeted by microRNAs. Comparative genomics has
previously revealed widespread evolutionarily conserved microRNA targeting of protein-coding mRNAs, and here
we applied a similar approach to lncRNAs.

Findings: We used a map of putative microRNA target sites in lncRNAs where site conservation was evaluated
based on 46 vertebrate species. We compared observed target site frequencies to those obtained with a random
model, at variable prediction stringencies. While conserved sites were not present above random expectation in
intergenic lncRNAs overall, we observed a marginal over-representation of highly conserved 8-mer sites in a small
subset of cytoplasmic lncRNAs (12 sites in 8 lncRNAs at 56% false discovery rate, P = 0.10).

Conclusions: Evolutionary conservation in lncRNAs is generally low but patch-wise high, and these patches could,
in principle, harbor conserved target sites. However, while our analysis efficiently detected conserved targeting of 
mRNAs, it provided only limited and marginally significant support for conserved microRNA-lncRNA interactions. We 
conclude that conserved microRNA-lncRNA interactions could not be reliably detected with our methodology. 
Findings

Background 

While small non-coding RNAs, such as microRNAs, have well-established functions in the cell, long
non-coding RNAs (lncRNAs) have only recently started to emerge as widespread regulators of cell
physiology [1]. Although early examples were discovered decades ago, large-scale transcriptomic
studies have since revealed that mammalian genomes encode thousands of long (>200 nt) transcripts
that lack coding capacity, but are otherwise mRNA-like [2-4]. Their biological importance has been
controversial, but novel functional lncRNAs with roles, for example, in vertebrate development [5],
pluripotency [6] and genome stability [7] are now being described at increasing frequency.

A few recent studies describe interactions between small and long non-coding RNAs, where lncRNAs
act either as regulatory targets of microRNA-induced destabilization [8,9] or as molecular decoys
of microRNAs [10-13]. Recent results also show that stable circular lncRNAs can bind and inhibit
microRNAs [14,15]. Importantly, RNAi-based studies, including silencing of 147 lncRNAs with
lentiviral shRNAs [6], show that lncRNAs are, in principle, susceptible to repression by
Argonaute-small RNA complexes, despite often localizing to the nucleus. In addition, there are data
from crosslinking and immunoprecipitation (CLIP) experiments that support binding of Argonaute
proteins to lncRNAs [16,17].

Comparative genomics has revealed that most protein-coding genes are under conserved microRNA
control: conserved microRNA target sites are present in 3' untranslated regions (UTRs) of
protein-coding mRNAs at frequencies considerably higher than randomly expected, clearly
demonstrating the impact of microRNAs on mRNA evolution [18,19]. While lncRNAs in general are
weakly conserved, they may have local patches of strong sequence conservation [20]. It was recently
shown that developmental defects caused by knockdown of lncRNAs in zebrafish could be rescued by
introduction of putative human orthologs identified based on such short patches [5], supporting
that lncRNA functions may be conserved over large evolutionary distances despite limited sequence
similarity. It is thus plausible that lncRNAs also have evolved to be targeted by microRNAs despite
their overall low conservation, and that this would manifest itself through the presence of target
sites in local conserved segments.

© 2013 Alaei-Mahabadi and Larsson; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of
the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use,
distribution, and reproduction in any medium, provided the original work is properly cited.

Alaei-Mahabadi and Larsson Silence 2013, 4:4
http://www.silencejournal.com/content/4/1/4

Results 

We used our previously described pipeline to map and assess the evolutionary conservation of
putative microRNA target sites in lncRNAs [21]. Briefly, we mapped complementary matches to
established microRNA seed families in the GENCODE v7 lncRNA annotation, which was recently
characterized in detail by the ENCODE consortium [4]. Conservation levels were determined based on
a 46-vertebrate multiple sequence alignment [22], and sites were scored based on their presence in
primates, mammals and non-mammal vertebrates. This allowed us to vary the stringency to consider
progressively smaller sets of transcripts with higher conservation levels. We compared observed
site frequencies to expected frequencies based on a random dinucleotide model, in protein-coding
genes and in subsets of lncRNAs (Figure 1).

Our analysis revealed widespread presence of conserved target sites in mRNAs, which recapitulates
previous observations and establishes our methodology [18,19]. Depending on prediction stringency
(conservation level and seed type), seed complementary matches to conserved microRNA families were
present at up to 6.1x the expected frequency in 3' UTRs, and 1.4x in coding regions (Figure 2A).
Sites for non-conserved microRNA families, which were included as a negative control, were observed
only at expected frequencies (Figure 2A).

Next, we investigated site frequencies in lncRNAs, specifically of the intergenic type to avoid
confounding genomic overlaps. In a set of 2,121 intergenic lncRNA genes, we observed no significant
enrichment of sites (Figure 2B). Restricting our search to 3' or 5' ends of transcripts, or subsets
of intergenic lncRNAs previously found to have conserved promoter regions [4], resulted in a
similar lack of enrichment (data not shown).

Many described lncRNAs participate in the assembly of riboprotein complexes in the nucleus [1],
while microRNAs are considered to be active primarily in the cytoplasm. We used subcellular RNA-seq
data to narrow down our analysis to a smaller set of cytoplasmic lncRNAs (n = 169), which were also
expressed at comparatively high levels (Figure 2B). Pan-mammalian conserved high-quality (8-mer)
sites were here observed at 1.8x the expected frequency (P = 0.10), which corresponds to a false
discovery rate of 56%, but the number of targets and sites was small (12 sites in 8 lncRNA genes,
Table 1). One of the eight target lncRNAs (AC010091.1) showed distant homology to human
protocadherin Fat 4 protein (maximum 36% identity over 94 a.a.), and could thus represent an
ancient pseudogene or misclassified coding gene. All others lacked homology to any of 565,000+
known sequences in UniProtKB/Swiss-Prot, and seven out of eight were also classified as long
non-coding in a recent RNA-seq-based mapping of human lncRNAs [3].
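
The relation between the 1.8x enrichment and the 56% false discovery rate can be checked with a few lines of arithmetic. The sketch below assumes, as a simplification not spelled out in the text, that the false discovery rate is estimated as the expected count divided by the observed count, that is, the inverse of the observed-to-expected ratio. The function name is hypothetical.

```python
def fdr_from_ratio(observed_to_expected):
    """Estimate the false discovery rate as expected/observed.

    Assumption: every site expected by chance counts as a false positive,
    so FDR = 1 / (observed-to-expected ratio), capped at 1.
    """
    if observed_to_expected <= 0:
        raise ValueError("ratio must be positive")
    return min(1.0, 1.0 / observed_to_expected)


ratio = 1.8          # observed-to-expected ratio reported for 8-mer sites
observed_sites = 12  # sites found in the cytoplasmic subset

fdr = fdr_from_ratio(ratio)
expected_sites = observed_sites / ratio

print(f"FDR ~ {fdr:.0%}, expected sites ~ {expected_sites:.1f}")
# prints: FDR ~ 56%, expected sites ~ 6.7
```

Under this assumption, roughly seven of the twelve sites would be expected by chance alone, which is why the enrichment is described as marginal.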

Conserved targeting of lncRNAs by microRNAs is plausible, given that lncRNAs are susceptible to
AGO-mediated repression, and that they show patch-wise strong sequence conservation. However, our
analysis indicates that this is not a widespread phenomenon, even though a small subset of
cytoplasmic transcripts showed a weak enrichment of conserved sites at marginal statistical
significance. LncRNAs are currently defined solely based on length and coding capacity, and are as
such likely to represent a highly functionally diverse group. It is thus possible that other, not
yet defined, subfamilies have evolved to be microRNA targets, but that this signal is too diluted
to be detectable in our current analysis.

It should be noted that the GENCODE annotation used here is one of several published lncRNA sets,
and while comprehensive, it does not cover all known transcribed loci [3]. Likewise, there are
several approaches to target site prediction and detailed results may vary. Notably, our analysis
was designed to capture an overall signature of conserved targeting, and when applied to mRNAs it
efficiently recapitulated a strong enrichment signal. Different implementations and annotations
could give variable results at the level of individual transcripts and sites, but the main
conclusion is unlikely to depend on these parameters.

Figure 1 Workflow to detect conserved microRNA targeting of long non-coding RNAs (lncRNAs). Conserved microRNA target sites
(complementary seed matches) were identified in the GENCODE human gene annotation based on a 46-species multiple sequence alignment as
described previously [21]. A total of 1,267 microRNA families were considered. Different subsets of lncRNAs were analyzed for over-representation
of sites compared to a random background model.

Figure 2 Ratios between observed and expected microRNA target site frequencies in coding genes and long non-coding RNAs
(lncRNAs). (A) Our methodology was first established on coding genes. The 3' untranslated regions (UTRs) and coding sequences (CDS) were
analyzed separately. We compared observed numbers of seed matches (in parentheses) to randomly expected numbers based on sets of
synthetic seeds that preserved the dinucleotide frequencies of the actual seeds. Different prediction stringencies (site conservation level and
seed quality) were applied, further explained within gray boxes. The analysis focused on highly conserved microRNA families (n = 87), but
non-conserved families were included as a control. Bars show mean observed-to-expected ratios from 20 repeated trials. (B) Similar analysis
based on intergenic lncRNAs and cytoplasmic intergenic lncRNAs. Placental mammal conserved 8-mer sites were present above expectation in
a small subset of cytoplasmic intergenic lncRNAs (12 sites for 11 microRNA families, in 8 lncRNA genes). Subcellular localization was determined
based on RNA-seq libraries from seven fractionated cell lines. *, empirical P <0.05 for ratio being greater than 1; (*), P = 0.10; n/a, observed
counts too low.

While some established microRNA-lncRNA interaction sites are conserved to various extents, in
principle enabling detection by comparative genomics approaches [8-10], others lack conservation
despite having experimentally confirmed functions [12,13]. This is consistent with data showing
that many non-conserved human microRNA sites can mediate targeting [23]. Notably, even
well-characterized lncRNAs, such as HOTAIR and XIST, have often evolved rapidly, and may show
considerable functional and structural differences within the mammalian lineage [24,25]. Our
comparative genomics methodology therefore does not exclude that non-conserved and recently evolved
targeting could be commonplace, and this motivates further computational and experimental studies.

Methods 

We relied on the GENCODE coding/non-coding classification, and considered as lncRNAs genes that
only produced transcripts of the 'antisense', 'lincRNA', 'non_coding' and 'processed_transcript'
types. We excluded pseudogenes, as well as any gene producing any splice isoform shorter than
200 nt. Genes with symbols corresponding to any RefSeq coding gene, or to the UCSC browser
xenoRefGene set, were removed from the long non-coding set, to control for a small number of cases
of obvious incorrect coding/non-coding classification in the GENCODE annotation. This resulted in a
set of 13,751/9,122 lncRNA transcripts/genes. A smaller subset of 2,121/2,777 intergenic lncRNA
genes/transcripts was stringently defined by requiring a genomic separation of at least 10 kb to
any other annotated gene.

Table 1 Pan-mammalian conserved 8-mer putative microRNA target sites in cytoplasmic intergenic long non-coding
RNAs (lncRNAs)

Target GENCODE ID    Target symbol    MicroRNA family      Site chromosome   Site genome coordinate   Cabili et al. lincRNA (a)   BLAST hits (b)
…                    …                …                    chr2              …                        …                           No hits
ENSG00000231532.1    …                …                    …                 …                        …                           …
…                    …                miR-22/22-3p         …                 …                        …                           …
…                    AC010091.1       miR-133abc           …                 …                        …                           …
ENSG00000233491.2    …                …                    …                 …                        …                           …
…                    …                …                    …                 …                        †                           †
…                    RP11-…           …                    chr1              …                        …                           …
ENSG00000245017.1    AC013418.2       miR-138/138ab        chr12             …                        …                           …
ENSG00000248927.1    CTD-2334D19.1    miR-135ab/135a-5p    chr5              120126269                Yes                         No hits
ENSG00000248927.1    CTD-2334D19.1    miR-19ab             chr5              120126442                †                           †
ENSG00000250366.1    AL133167.1       miR-218/218a         chr14             96389499                 Yes                         No hits
ENSG00000253507.1    CTD-2501M5.1     miR-146ac/146b-5p    chr8              …                        No                          …

a Annotated as a long non-coding RNA in Cabili MN, Trapnell C et al., Genes and Development (2011).
b Hits with BLAST E-value <0.5. Repeat masking was performed to avoid matches to, for example, translated SINEs in SwissProt.
Genomic coordinates refer to the hg19 assembly.
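
The intergenic criterion described in the Methods (a separation of at least 10 kb to any other annotated gene) can be illustrated with a minimal sketch. The function name, the input layout (gene name mapped to chromosome, start, end) and the example coordinates are hypothetical; strand is ignored here, which is an assumption, and the quadratic scan is chosen for clarity rather than speed.

```python
def intergenic_genes(genes, min_gap=10_000):
    """Return names of genes at least `min_gap` bp away from every other gene.

    genes: dict mapping gene name -> (chromosome, start, end).
    Overlapping genes yield a negative gap and are therefore excluded.
    """
    isolated = []
    for name, (chrom, start, end) in genes.items():
        keep = True
        for other, (o_chrom, o_start, o_end) in genes.items():
            if other == name or o_chrom != chrom:
                continue
            # Distance between the two intervals; negative if they overlap.
            gap = max(o_start - end, start - o_end)
            if gap < min_gap:
                keep = False
                break
        if keep:
            isolated.append(name)
    return sorted(isolated)


# Toy annotation (made-up coordinates, for illustration only).
toy = {
    "lnc1": ("chr1", 100_000, 105_000),   # 15 kb from geneA: kept
    "geneA": ("chr1", 120_000, 130_000),  # 1 kb from lnc2: dropped
    "lnc2": ("chr1", 131_000, 132_000),   # 1 kb from geneA: dropped
    "lnc3": ("chr2", 500, 900),           # alone on its chromosome: kept
}
print(intergenic_genes(toy))
```

For a genome-scale annotation, sorting genes by position and comparing only neighbors would be the more practical choice.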

MicroRNA target sites in GENCODE v7 genes were mapped as described previously [21]. Random seed
sequences were generated under a dinucleotide model that preserved nucleotide frequencies of the
actual microRNA family seeds, and were subsequently mapped in the same way as the actual seed
sequences. Ratios of observed-to-expected site counts were calculated based on these random seeds,
for different conservation level thresholds and seed match types. To assess the statistical
significance of these ratios, 20 sets of random seeds were evaluated, each set being of the same
size as the set of actual conserved families (n = 87). At least 19/20 cases of ratio >1 were
required for significance at the empirical P <0.05 level, and 18/20 for P = 0.10. MicroRNA family
definitions and conservation classifications were derived from TargetScan [18]. We used data from a
previous study [4] to define subsets of lncRNAs with conserved regulatory regions. The 500 or 250
most conserved intergenic lncRNAs based on either pan-mammal or pan-vertebrate promoter
conservation scores (in total, four sets) were analyzed as described above.
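
The random-seed comparison above can be sketched as follows. This is not the published pipeline [21]: the dinucleotide model is approximated here by a first-order Markov chain trained on the actual seeds, conservation filtering is left out, and sequences are written in the DNA alphabet. All function names are hypothetical. Only the 20 trials and the 19/20 and 18/20 thresholds are taken from the text.

```python
import random
from collections import defaultdict

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq):
    """Reverse complement; a target site is the reverse complement of a seed."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def train_dinucleotide_model(seeds):
    """Count first bases and base-to-base transitions in the actual seeds."""
    first = defaultdict(int)
    trans = defaultdict(lambda: defaultdict(int))
    for s in seeds:
        first[s[0]] += 1
        for a, b in zip(s, s[1:]):
            trans[a][b] += 1
    return first, trans


def sample_seed(first, trans, length, rng):
    """Draw one synthetic seed from the first-order Markov chain."""
    seq = rng.choices(list(first), weights=list(first.values()))[0]
    while len(seq) < length:
        nxt = trans.get(seq[-1])
        if not nxt:  # no observed transition: fall back to first-base counts
            nxt = first
        seq += rng.choices(list(nxt), weights=list(nxt.values()))[0]
    return seq


def count_sites(seeds, transcripts):
    """Count all (possibly overlapping) seed-complementary matches."""
    total = 0
    for seed in seeds:
        site = revcomp(seed)
        for t in transcripts:
            total += sum(t.startswith(site, i) for i in range(len(t) - len(site) + 1))
    return total


def observed_to_expected(actual_seeds, transcripts, n_trials=20, rng_seed=0):
    """Return one observed/expected ratio per random seed set."""
    rng = random.Random(rng_seed)
    first, trans = train_dinucleotide_model(actual_seeds)
    observed = count_sites(actual_seeds, transcripts)
    ratios = []
    for _ in range(n_trials):
        synthetic = [sample_seed(first, trans, len(s), rng) for s in actual_seeds]
        expected = count_sites(synthetic, transcripts)
        ratios.append(observed / expected if expected else float("inf"))
    return ratios


def significance_label(ratios):
    """Apply the thresholds stated in the text for 20 trials."""
    above = sum(r > 1 for r in ratios)
    if above >= 19:
        return "P < 0.05"
    if above == 18:
        return "P = 0.10"
    return "n.s."
```

A call such as `significance_label(observed_to_expected(seeds, transcripts))` would then give the label used in Figure 2, under the assumptions listed above.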

RNA-seq data (fastq files) produced within the ENCODE project [26] by the Gingeras laboratory (Cold
Spring Harbor Laboratories, Cold Spring Harbor, NY, USA) were obtained through the UCSC FTP server.
A total of 1.71 billion 76 nt read pairs from polyA+ nuclear and cytoplasmic fractions from seven
human cell lines (Gm12878, HelaS3, HepG2, Huvec, H1hesc, Nhek and K562) were aligned to the human
hg19 reference genome with Tophat [27]. The aligner was supplied with GENCODE gene models using the
-G option. Genes were quantified using the HTSeq-count utility
(http://www-huber.embl.de/users/anders/HTSeq). Cytoplasmic transcripts were defined as having a
normalized cytoplasm/nucleus ratio >1. A total of at least 20 mapped reads across all conditions
was required, to avoid unreliable cytoplasm/nuclear ratios in the low-abundance range.
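
The cytoplasmic filter can be sketched from per-gene read counts. The text does not state how the cytoplasm/nucleus ratio was normalized, so the example below assumes simple library-size normalization (each count divided by the total of its fraction). The function name and the toy counts are hypothetical; the ratio threshold of 1 and the minimum of 20 reads follow the text.

```python
def cytoplasmic_genes(cyto_counts, nuc_counts, min_total_reads=20):
    """Select genes with a library-size-normalized cytoplasm/nucleus ratio > 1.

    cyto_counts, nuc_counts: dicts mapping gene name -> summed read count
    over all cell lines for the cytoplasmic and nuclear fractions.
    """
    cyto_total = sum(cyto_counts.values())
    nuc_total = sum(nuc_counts.values())
    if cyto_total == 0 or nuc_total == 0:
        raise ValueError("both fractions need at least one mapped read")

    selected = []
    for gene in cyto_counts.keys() | nuc_counts.keys():
        c = cyto_counts.get(gene, 0)
        n = nuc_counts.get(gene, 0)
        if c + n < min_total_reads:
            continue  # too few reads for a reliable ratio
        c_norm = c / cyto_total
        n_norm = n / nuc_total
        # Genes seen only in the cytoplasm are treated as cytoplasmic.
        if n_norm == 0 or c_norm / n_norm > 1:
            selected.append(gene)
    return sorted(selected)


# Toy counts (made up): "a" is cytoplasm-enriched, "b" is nuclear,
# "c" falls below the 20-read minimum.
cyto = {"a": 90, "b": 10, "c": 5}
nuc = {"a": 50, "b": 150, "c": 5}
print(cytoplasmic_genes(cyto, nuc))
```

In practice, the counts would come from the HTSeq-count output for each fraction, summed across the seven cell lines.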

Ethical approval or patient consent was not required 
for this study. 